Identifying Syntactic Role of Antecedent in Korean Relative Clause Using Corpus and Thesaurus Information
Authors
Abstract
This paper describes an approach to identifying the syntactic role of an antecedent in a Korean relative clause, which is essential to structural disambiguation and semantic analysis. In a learning phase, linguistic knowledge such as conceptual co-occurrence patterns and the syntactic role distribution of antecedents is extracted from a large-scale corpus. Then, in an application phase, the extracted knowledge is applied to determine the correct syntactic role of an antecedent in a relative clause. Unlike previous research based on co-occurrence patterns at the lexical level, we represent co-occurrence patterns with concept types in a thesaurus. In an experiment, the proposed method achieved an accuracy of 90.4% in resolving ambiguities in the syntactic role determination of antecedents.

1 Introduction

A relative clause is a clause that modifies an antecedent in a sentence. Determining the syntactic role of the antecedent in the verb argument structure of the relative clause is important in parsing and structural disambiguation (Li et al., 1998). When case frames of a verb are applied for structural disambiguation, identifying the role of the antecedent significantly affects the correctness of the result.

In this paper, we describe a method of identifying the syntactic role of antecedents, which consists of two phases. First, in the learning phase, conceptual patterns (CPs) and the syntactic role distribution of antecedents are extracted from a corpus of 6 million words, the Korean Language Information Base (KLIB). The conceptual patterns reflect the possible case restrictions of a verb in terms of concept types, while the syntactic role distribution shows which syntactic roles the antecedents of a verb prefer. Second, in the application phase, the syntactic role of an antecedent is decided using the CPs and the syntactic role distribution.

The rest of this paper is organized as follows. Section 2 reviews the problems and related work. Section 3 describes a statistical approach to extracting conceptual patterns from a large corpus as knowledge for determining syntactic roles. Section 4 describes how to identify syntactic roles using conceptual patterns and the syntactic role distribution of antecedents in the corpus. Section 5 presents an experimental evaluation of the method. The last section concludes with some discussion. The Yale Romanization is used to represent Korean expressions.

2 Problems and Related Work

In English, it is possible to recognize the syntactic role of antecedents by their position (trace) in relative clauses and the valency information of verbs. For example, the syntactic role of the antecedent "man" can be recognized as subject of the relative clause in the sentence "He is the man who lives next door" and as object in the sentence "He is the man whom I met." Relative pronouns such as who, whom, that, whose, and which can also be used in identifying the role of antecedents in relative clauses.

However, identifying the syntactic role of antecedents in Korean relative clauses is not a trivial task. Korean is a head-final language, so the antecedent comes after the relative clause. The rest of this section describes three main characteristics of Korean relative clauses that make it difficult to determine the syntactic role of their antecedents. The first characteristic is that, unlike English, Korean lacks relative words corresponding to English
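The two-phase method outlined in the introduction (a learning phase that collects conceptual patterns and the syntactic role distribution of antecedents, and an application phase that uses both to choose a role) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy `THESAURUS`, the function names `learn` and `identify_role`, the observation format, and in particular the scoring rule, which simply multiplies a smoothed concept-fit estimate by a smoothed role preference. The excerpt does not say how the two knowledge sources are actually combined.

```python
from collections import Counter, defaultdict

# Hypothetical toy thesaurus mapping nouns (Yale Romanization) to concept types.
THESAURUS = {
    "haksayng": "PERSON",    # student
    "sensayng": "PERSON",    # teacher
    "chayk": "ARTIFACT",     # book
    "sinmun": "ARTIFACT",    # newspaper
}
CONCEPTS = sorted(set(THESAURUS.values()))


def learn(observations):
    """Learning phase (sketch).

    observations: iterable of (verb, role, noun, is_antecedent) tuples,
    assumed to come from a parsed corpus.
    Returns conceptual patterns and the antecedent role distribution.
    """
    patterns = defaultdict(Counter)   # (verb, role) -> counts of concept types
    role_dist = defaultdict(Counter)  # verb -> counts of antecedent roles
    for verb, role, noun, is_antecedent in observations:
        concept = THESAURUS.get(noun)
        if concept is not None:
            patterns[(verb, role)][concept] += 1
        if is_antecedent:
            role_dist[verb][role] += 1
    return patterns, role_dist


def identify_role(verb, antecedent, filled_roles, patterns, role_dist,
                  roles=("subject", "object")):
    """Application phase (sketch): pick the unfilled role with the best score.

    The score is an assumed combination: add-one smoothed
    P(concept | verb, role) times add-one smoothed P(role | verb).
    """
    concept = THESAURUS.get(antecedent)
    best_role, best_score = None, -1.0
    for role in roles:
        if role in filled_roles:
            continue  # role already taken by an overt argument in the clause
        counts = patterns.get((verb, role), Counter())
        fit = (counts[concept] + 1) / (sum(counts.values()) + len(CONCEPTS))
        prefs = role_dist.get(verb, Counter())
        prior = (prefs[role] + 1) / (sum(prefs.values()) + len(roles))
        score = fit * prior
        if score > best_score:
            best_role, best_score = role, score
    return best_role


# Toy observations for the verb stem "ilk" (read); purely illustrative data.
OBSERVATIONS = [
    ("ilk", "subject", "haksayng", False),
    ("ilk", "object", "chayk", False),
    ("ilk", "object", "chayk", True),
    ("ilk", "subject", "sensayng", True),
]

if __name__ == "__main__":
    cps, dist = learn(OBSERVATIONS)
    print(identify_role("ilk", "chayk", set(), cps, dist))
    print(identify_role("ilk", "sensayng", set(), cps, dist))
```

On this toy data the sketch prefers the object role for "chayk" and the subject role for "sensayng", because the concept type of each noun matches the concept types previously seen in that role of the verb. Representing patterns by concept type rather than by word is what lets an unseen noun such as "sinmun" reuse the counts gathered for "chayk".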
Similar resources
Semantic Priming Effect on Relative Clause Attachment Ambiguity Resolution in L2
This study examined whether processing ambiguous sentences containing relative clauses (RCs) following a complex determiner phrase (DP) by Persian-speaking learners of L2 English with different proficiency and working memory capacities (WMCs) is affected by semantic priming. The semantic relationship studied was one between the subject/verb of the main clause and one of the DPs in the complex D...
Relative Clause Ambiguity Resolution in L1 and L2: Are Processing Strategies Transferred?
This study aims at investigating whether Persian native speakers highly advanced in English as a second language (L2ers) can switch to optimal processing strategies in the languages they know and whether working memory capacity (WMC) plays a role in this respect. To this end, using a self-paced reading task, we examined the processing strategies 62 Persian speaking proficient L2ers used to read...
Robust clause boundary identification for corpus annotation
The paper describes a rule-based system for tagging clause boundaries, implemented for annotating the Estonian Reference Corpus of the University of Tartu, a collection of written texts containing ca 245 million running words and available for querying via Keeleveeb language portal. The system needs information about parts of speech and grammatical categories coded in the word-forms, i.e. it ta...
Semantic Role Labeling of Persian Sentences Using a Memory-Based Learning Approach
Extracting semantic roles is one of the major steps in representing text meaning. It refers to finding the semantic relations between a predicate and syntactic constituents in a sentence. In this paper we present a semantic role labeling system for Persian, using a memory-based learning model and standard features. Our proposed system implements a two-phase architecture to first identify...
Prosodic Disambiguation of Syntactic Clause Boundaries in Korean
This study explored the effects of prosodic boundaries on the comprehension of temporarily and globally ambiguous sentences in Korean. Previous work on English, Japanese, and Korean has demonstrated that prosodic structures can carry critical information about the meaning and/or syntactic structures of spoken sentences (Kjelgaard and Speer 1999; Jun and Oh 1996; Misono et al. 1997; Venditti 199...